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Abstract 

Sex determination mechanisms are highly variable across teleost fishes and sexual development is often plastic. 
Nevertheless, downstream factors establishing the two sexes are presumably conserved. Here, we study sequence evolution 
and gene expression of core genes of sexual development in a prime model system in evolutionary biology, the East African 
cichlid fishes. Using the available five cichlid genomes, we test for signs of positive selection in 28 genes including duplicates 
from the teleost whole-genome duplication, and examine the expression of these candidate genes in three cichlid species. 
We then focus on a particularly striking case, the A- and B-copies of the aromatase cyp19a1, and detect different evolu- 
tionary trajectories: cyp19a1A evolved under strong positive selection, whereas cyp19a1B remained conserved at the 
protein level, yet is subject to regulatory changes at its transcription start sites. Importantly, we find shifts in gene 
expression in both copies. Cyp19a1 is considered the most conserved ovary-factor in vertebrates, and in all teleosts 
investigated so far, cyp19a1A and cyp19a1B are expressed in ovaries and the brain, respectively. This is not the case in 
cichlids, where we find new expression patterns in two derived lineages: the A-copy gained a novel testis-function in the 
Ectodine lineage, whereas the B-copy is overexpressed in the testis of the speciest-richest cichlid group, the Haplochromini. 
This suggests that even key factors of sexual development, including the sex steroid pathway, are not conserved in fish, 
supporting the idea that flexibility in sexual determination and differentiation may be a driving force of speciation. 
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Introduction 

Sexual development in animals involves the steps of sex 
determination, the initial decision to develop either ovaries 
or testes, and sex differentiation, the subsequent develop- 
mental program establishing the male and female phenotype 
through the action of steroid hormones. Interestingly, the 
master regulatory triggers of sex determination are not con- 
served throughout the animal kingdom, and sex differentia- 
tion can be rather plastic (Cutting et al. 2013). This is 
particularly exemplified by teleost fishes, in which all sorts 
of natural hermaphroditism occur and sex changes can be 
induced through a variety of external factors (Godwin 2010, 
and references therein). 

Contrary to the observed variation in the initial triggers of 
sex determination, the downstream genetic factors of sexual 
development are thought to be conserved between species 
because they are part of complex genetic networks with often 
pleiotropic effects. Their modifications presumably have 
deleterious effects, whereas changes "at the top of the hier- 
archies" are apparently better tolerated (Marin and Baker 
1998). Previous studies have thus mainly focused on sex 
differentiation and the ubiquitous downstream genes 
(Valenzuela et al. 2013). Of particular interest here are the 
presumably conserved genetic and enzymatic network regu- 
lating the production of sex steroid hormones in vertebrates, 
where estrogens mediate ovarian development and 



androgens testis differentiation. For example, the action of 
cypl9al, which aromatizes androgens into estrogens, is con- 
sidered a general feature of ovarian development in fish, am- 
phibians, reptiles, and mammals (for a review see Nakamura 
2010), whereas sox9 appears to be its major counterpart in 
testis development (Valenzuela et al. 2013). 

Sex-determining systems in teleosts, the taxonomically 
largest group of vertebrates, changed frequently during evo- 
lution and vary even between closely related species (Mank 
et al. 2006). So far, five male master sex-determining genes 
(i.e., equivalents to the mammalian sry) have been identified: 
DMY/dmrtlbY in medaka (Oryzias latipes) and its sister spe- 
cies O. curv'motus (Matsuda et al. 2002, 2003; Nanda et al. 
2002); Gsdf, a member of the transforming growth factor- 
beta superfamily, in O. luzonensis (Myosho et al. 2012); sdY in 
the rainbow trout (Yano et al. 2012); amhy, (Y-chromosomal 
copy of the ant'hMullerian hormone) in the Patagonian pejer- 
rey (Odontesthes hatcher]) (Hattori et al. 2012); and in three 
Takifugu species (T. rubripes, T. pardalis, and T. poecilonotus) 
an association between male sex and a (proto-)Y-chromo- 
somal single nucleotide polymorphism in the amh receptor 
type 11 (amhr2) was found (Kamiya et al. 2012). Four of these 
genes, DMY/dmrtlbY, Gsdf, amhy and amhrl, exemplify the 
scenario that genes already implicated downstream in a net- 
work (such as the genetic cascade of sex determination) are 
more readily uprecruited to the top of the hierarchy than 
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nonimplicated genes — a hypothesis originally proposed by 
Wilkins (1995). Sex determination and differentiation have 
also been studied in various other model fish species including 
zebrafish (Anderson et al. 2012), the poeciliid fishes guppy 
(Tripathi et al. 2009) and platyfish (Bonne et al. 2009), with a 
main focus on understanding and characterizing the genetic 
control over the establishment of the two sexes. Also, the 
downstream factors or particular genetic pathways of, for 
example, gonad development have been studied in several 
species (for review see Nakamura 2010; Angelopoulou et al. 
2012). Another field of research that has received much 
attention is sex change in fish and, particularly, the identifi- 
cation of hormones and chemical components (including 
pollutants) that influence it (for review see Scholz and 
Kliiver 2009). Thus, a considerable body of literature on sex 
determination and differentiation in fish is available, including 
data on many candidate genes. 

Cichlid fishes of the East African Great Lakes Malawi (LM), 
Victoria (LV), and Tanganyika (LT) are a prime model system 
in evolutionary biology and provide an exceptional opportu- 
nity to study organismal diversification (Kocher 2004). Each of 
the Great Lakes harbors cichlid species assemblages counting 
hundreds of endemic species, which evolved within a few 
millions of years (LT; Salzburger et al. 2002), to less than 
150,000 years (LV; Verheyen et al. 2003) only. This opens up 
a unique possibility for comparative studies at different evo- 
lutionary time scales, involving repeated patterns of diversifi- 
cation. As for the majority of fish species, the triggers of sex 
determination in cichlids are largely unknown. Yet, it 
becomes clear that also in this group various mechanisms 
exist, including genetic systems and environmental triggers 
such as water pH and temperature (Baroiller 2009; Reddon 
and Hurd 2013). The known genetic factors in cichlids in- 
clude, for example, sex determination via B-chromosomes 
(small supernumerary/accessory chromosomes, Yoshida 
et al. 2011) and male and female heterogametic sex chromo- 
some systems, with the possibility of both systems co-existing 
within a single species (Roberts et al. 2009). The best-studied 
cichlid species with respect to sexual development is the Nile 
tilapia (Oreochromis niloticus), a member of a more basal 
lineage, widely distributed in rivers and lakes of Africa. The 
Nile tilapia has an XX-XY sex-determining system, which can 
substantially be influenced by temperature (Baroiller 2009). 
For this species, expression profiles of key genes of sexual 
development are available (Ijiri et al. 2008), which is not the 
case for other cichlid species such as the radiating lineages in 
East Africa. The only exception to some extent is the LT 
cichlid Astatotilapia burtoni, in which the relation between 
hormones and behavior has been examined (Renn et al. 
2012). To the best of our knowledge, sex determination 
mechanisms and sex differentiation have not been studied 
in a phylogenetically representative set of East African lake 
cichlids. To this end, it is particularly important to focus on 
the cichlid assemblage of LT, which is the oldest of the three 
lakes and, hence, home to the genetically, morphologically 
and ecologically most diverse cichlid species flock 
(Salzburger et al. 2002). LT's 250 mostly endemic species 
(Snoeks 2001) are currently classified into 12 to 16 lineages, 



so-called tribes (Muschick et al. 2012). The taxonomic situa- 
tion is quite different in LM and LV, where the adaptive ra- 
diations consist of only one of these tribes, the Haplochromini 
(Salzburger et al. 2002). 

Genes involved in sex determination (including its master 
regulators), differentiation, and reproduction often evolve 
under a positive selection regime (Sorhannus and 
Kosakovsky Pond 2006; Hasselmann et al. 2008; Morgan 
et al. 2010; Sobrinho and de Brito 2010). This has been ex- 
plained by selection for reproductive isolation (which may 
ultimately lead to speciation, Vacquier and Swanson 2011), 
intersexual (genomic) conflict (Rice and Holland 1997), sperm 
competition (Pizzari and Birkhead 2002), and/or male-female 
coevolution (Swanson et al. 2003). This general trend seems 
to hold true also for cichlids, in which previous scans for 
positive selection uncovered three genes with putative roles 
in sexual development and reproduction: an aromatase gene 
(cyp 7 9a 7 A, Gerrard and Meyer 2007), a gene with a gonad- 
specific expression profile (SPP120, Gerrard and Meyer 2007), 
and a meiotic maturation factor (20-beta~hydroxysteroid de- 
hydrogenase, Baldo et al. 2011). 

Here, we investigate 28 candidate genes in detail in cichlids. 
The genes were recovered from a thorough literature survey 
focusing on genes involved in sex determination and differ- 
entiation in vertebrates, and, especially in teleosts (see sup- 
plementary table S1 for details, Supplementary Material 
online). These candidates include gene copies that have 
emerged in the course of the fish-specific whole-genome du- 
plication (FSGD; Meyer and Schartl 1999). To our knowledge, 
this has not been done before for genes involved in sexual 
development, although it seems crucial as gene duplications 
open the routes to partitioning gene functions and to neo- 
functionalization. We first investigate the candidate genes for 
sequence evolution using the available cichlid genomes (in- 
cluding species from all three lakes) and study their expression 
in ovary, testis and brain tissue in three species from LT 
(A. burtoni, Ophthalmotilapia centralis, and Neolamprologus 
pulcher), which were chosen with respect to available genetic 
and genomic resources (Baldo et al. 2011 and BROAD; www. 
broadinstitute.org/models/tilapia, last accessed July 23, 2013). 
Sequence data analysis of key genes are then extended to a 
phylogenetically representative set of 32 species belonging to 
13 different tribes and the expression analysis is extended 
to nine East African cichlid species, representing four tribes, 
including species from all three lakes. 

Results 

Sequence Evolution of Candidate Genes: Signs of 
Positive Selection in East African Cichlids 
To test for signs of positive selection in our set of 28 candidate 
genes, we used the annotated sequences of the Nile tilapia 
genome as reference to recover full-length coding sequences 
from all available cichlid genomes (A. burtoni and N. brichardi 
from LT, Metriaclima zebra from LM, and Pundamilia nyererei 
from LV; BROAD institute, see supplementary table S2 for 
details, Supplementary Material online). Using tilapia as out- 
group, we tested for signs of positive selection specific to the 
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East African cichlid radiations. We calculated mean gene wide 
Ka/Ks (nonsynonymous to synonymous substitution rates; 
see supplementary table S3 [Supplementary Material 
online] and Material and Methods for details) for all pairwise 
comparisons. The two genes with the highest gene-wide Ka/ 
Ks values were cypl9a1A, as also detected by Gerrard and 
Meyer in an expressed sequence tag (EST) wide screen 
(Gerrard and Meyer 2007; discussed earlier), and nanoslA, 
which has not been studied in this context in cichlids so 
far. Note that a role for a nanos family member in germ cell 
differentiation has been proposed in tilapia (Kobayashi 2010). 

One drawback with gene-wide Ka/Ks analyses is that they 
might miss signals of adaptive sequence evolution that are 
restricted to certain parts of a gene only; it is generally unlikely 
that all positions of a given gene experience positive selection 
at the same time, and many cases are known where positive 
selection is restricted to particular domains of a gene 
(Salzburger et al. 2007). To account for this, we analyzed 
our gene set using Selecton (Doron-Faigenboim et al. 2005; 
Stern et al. 2007) to test for site-wise selection, again using Nile 
tilapia as reference. We detected positively selected sites in 
the following genes: cypl9alA (in 60 amino acid positions out 
of 511),/ox/2B (9 of 268), sox9A (1 of 500), sox9B (7 of 484), 
tdrdl (96 of 1,164), and wnt4A (2 of 352). The results for 
cypl9alA and tdrdl remained significant after comparison 
with the alternative hypotheses (i.e., no positive selection, see 
supplementary fig. S1 [Supplementary Material online] for 
details). 

Sequence Evolution of Cyp19a1 in Cichlids 
As the initial analyses based on the five cichlid genomes 
pointed out new patterns of sequence evolution for 
cypl9alA, whereas we could not detect a single site under 
positive selection in the B-copy (cypl9alB; supplementary 
fig. S1, Supplementary Material online), we decided to further 
study these two genes in East African cichlids. Overall, 
cypl9alB is more conserved among the five cichlid genomes: 
Cyp19a1A differs in 12.2% (average over all pairwise compar- 
isons to tilapia, done with KaKs_calculator, Zhang et al. 2006) 
of all amino acid sites (3% synonymous changes and 9.2% 
nonsynonymous changes), whereas Cyp19a1B varies in 9.8% 
of all amino acid sites (4.8% synonymous changes and 
5% nonsynonymous changes). Especially, nonsynonymous 
substitutions occur more often in Cyp19a1A compared to 
Cyp19a1B. 

We then extended our analyses to a more representative 
set of East African cichlids. To this end, we used Ion Torrent 
next-generation amplicon sequencing (LifeTechnologies) to 
re-sequence a 3.6 kb genomic fragment of cypl9alA in 28 
cichlid species belonging to 12 different tribes (see Materials 
and Methods and supplementary table S4 [Supplementary 
Material online], GENBANK accession numbers KC684559- 
KC684586). The cypl9alA coding sequences of all 28 newly 
sequenced species plus the sequences obtained from the cich- 
lid genomes (O. niloticus as reference, N. pulcher, A. burtoni, 
and P. nyererei) were subjected to tests for positive selection 
as described earlier. The overall gene-wide Ka/Ks of 0.91 



resulting from this analysis was similar to the value obtained 
with the limited data set. However, when checking in the 
comprehensive data set for site-wise selection with 
Selecton, we detected an additional 50 amino acid sites 
(adding up to a total of 108 out of 511 sites) under positive 
selection. All but three of the sites suggested to have evolved 
under positive selection in the initial analysis were confirmed 
by this second, more extensive analysis. 

Placing the sites under a positive selection regime into the 
structural context of the well-studied aromatase protein 
(fig. 1) revealed that they do not cluster in a specific region 
of the coding sequence but are rather distributed over the 
entire protein and also occur in major functional domains. 
Additionally, the observed amino acid changes (marked with 
arrows above the strongest selected sites, fig. 1 ) are mostly not 
conservative. This suggests that cypl9a1A has experienced 
important changes in its functionality in East African cichlids. 

To obtain a more detailed understanding of cypl9alA 
sequence evolution in East African cichlids, we investigated 
positive selection in the phylogenetic context using 
HyPhy (Pond et al. 2005) and Codeml (Yang 2007) (fig. 2 
and table 1). 

These analyses uncovered several branches in the phylog- 
eny with particularly strong adaptive sequence evolution. 
One of these branches represents the ancestor of the adaptive 
radiation of cichlids in the East African Great lakes (branch 
after split from the outgroup O. niloticus; light gray box, fig. 2). 
The signal of positive selection gets even stronger when the 
basal Tanganyikan tribes Bathybatini, Trematocarini, and 
Boulengerochromini are excluded (dark gray box, fig. 2). A 
strong signal of positive selection was also found in the lineage 
leading to the taxonomically largest cichlid tribe, the 
Haplochromini (dark green box, fig. 2, remember that all cich- 
lids of LV and LM belong to this group). The so detected 
amino acid positions are consistent with the ones shown in 
figure 1 and fall into the two first categories of strongest 
positive selection of Selecton (fig. 2 and table 1). 

Comparative Sequence Analysis of Cyp19a1A and B: 
Transcription Start Site Variation and Promoter 
Evolution 

Using cichlid transcriptome databases (access over BROAD 
cichlid genome consortium, assembled RNA-seq data for the 
species with sequenced genomes), we investigated the tran- 
scription start sites (TSSs) of both cyp19a1 gene copies. 
Cypl9a1 transcripts were found in transcriptome databases 
derived from brain, gonad and mixed embryonic tissue. 
Interestingly, there are substantial differences between the 
two genes: In cypl9alA, the TSSs of the three presumably 
functional transcripts (i.e., transcripts comprising the start 
codon marked with an orange box in supplementary fig. S2, 
Supplementary Material online) found in O. niloticus, A. bur- 
toni, and A/I. zebra are 20 nucleotides apart of each other. 
Quite to the contrary, the TSSs of cypl9alB are located 
more than 1,200 bp apart from each other in the transcripts 
found for A. burtoni, P. nyererei, and A/1, zebra. A closer in- 
spection revealed two alternative splice forms of cypl9alB: 
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Fig. 1. Placement of site-specific selection on the protein sequence of Cyp19a1A. Selection values shown were calculated on a multispecies alignment 
covering the LT cichlid assemblage using Selecton. Structural information on the Cyp19a1 A protein was taken from (Graham-Lorence et al. 1995; Chiang 
et al. 2001; Castro et al. 2005). Occurring amino acid changes are indicated above the sites under strongest positive selection as detected with Selecton 
(category 1). 



transcription of cypl9alB in cichlids can start inside intron 1 
or mark the beginning of the untranslated exon 1, in which 
case intron 1 is correctly spliced out (see supplementary fig. S2 
[Supplementary Material online] for the intron location). 
For the two basal species O. niloticus and N. brichardi, we 
only detected the splice variant with the TSS inside exon 1. 

Regulation of Cyp19a1A and B Expression in 
(Cichlid) Fish 

The cypl9alA and B promoter regions have previously been 
studied in several teleost species (Callard et al. 2001; Diotel 
et al. 2010) just as the promoter of the single copy cyp 7 9a 7 
gene in tetrapods (Hinshelwood et al. 2000; Nakagawa and 
Iwabuchi 2012). In a first step, to characterize general patterns 
in the molecular evolution of the cypl9al upstream regions in 
teleosts, we compared the existing cichlid sequences with all 
other available teleost aromatase promoters (fig. 3 and 
table 2). Note that cypl9alB is absent from the available 
/VI. zebra (LM) genome assembly version 1, so that we 



excluded this species in further analysis. Using Vista plots of 
nucleotide similarity (Mayor et al. 2000; Frazer et al. 2004), we 
identified three conserved regions in the upstream sequences 
of both gene copies marked green, yellow, and brown in figure 
3. Here, we define the upstream region as 5'-sequence be- 
tween the start codon of the cyp 7 9a 7 A and B genes and 
their respective adjacent gene. For cyp 7 9a 7 A, we analyzed 
the 3,394 bp until its neighboring gene gliomedinA (gldnA); 
for cypl9alB, we analyzed 2,571 bp to gldnB. We inspected 
the three conserved regions with regard to common tran- 
scription factor binding sites focusing on factors with a 
known function/response role in brain/central nervous 
system (CNS) and gonads/germ cells, as these are the tissues 
in which the aromatase genes are described to be expressed. 
Although remarkable differences in the expression patterns of 
cypl9alA and B exist between teleosts, and especially be- 
tween cichlids (discussed later), the cypl9al promoters still 
show conserved regions between all teleosts, underlining their 
general functional importance. 
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Fig. 2. Ka/Ks rates on a phylogenetic tree of aromatase cypl9alA in cichlids. Phylogenetic tree of aromatase cypl9alA of representatives of the major 
East African cichlid lineages showing the Ka/Ks rates reconstructed with HyPhy and branch-site selection as calculated with codeml. Nodes are labeled 
with Bayesian posterior probabilities as obtained with MrBayes. Colored boxes denote clades that were tested against all other branches with codeml 
(see Materials and Methods for details). Significant obtained Ka/Ks values (co) are shown next to the boxes in the respective colors, co values denoted in 
blue were calculated with Branch-site REL in HyPhy. Table 1 presents the sites that are under positive selection as detected by codeml in the different 
clades that show a significant co with a probability (BEB) of more than 50%. The values obtained with Selecton are given in addition. 



Putative transcription factor binding sites inside the 
conserved regions and shared between teleosts (underlined 
text in table 2) comprise factors belonging to the two inves- 
tigated categories (brain/CNS and gonads/germ cells). None 
of these categories seems to dominate, however, neither in 
cypl9alA nor in cypl9alB. 



Putative Transcription Factor Binding Sites in Teleost 
Conserved Upstream Regions of Cyp19a1A and B 
Using Mat Inspector (Genomatix Software GmbH), we iden- 
tified putative estrogen response elements (EREs) in the 
upstream regions of both cypl9al duplicates, which is con- 
sistent with previous studies in fish (Diotel et al. 2010). In the 
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Table 1. Cyp19a1A Amino Acid Sites under Positive Selection in Different Phylogenetic Clades of East African Cichlids. 



Cyp19a1A Amino 


Category of 


BEB of Positive Selection in Clades with Significant co b as 


Tested with Codeml 


Acid Position in 

(*\ ni \e\t\ri ic 
\J. I IIIUULUb 
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0.955 
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2 
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0.81 


0.852 


123 


1 








0.624 


157 


1 


0.563 


0.569 






169 


1 


0.646 








191 


1 


0.998 


0.998 


0.885 




205 


1 


0.978 


0.98 


0.535 


0.628 


257 


1 


0.858 








314 


1 






0.772 


0.831 


362 


1 


0.816 


0.827 


0.621 


0.696 


370 


1 


0.777 


0.813 






461 


1 


0.859 




0.659 


0.721 


471 


1 


0.979 


0.98 


0.944 


0.956 


507 


1 


0.628 









Note. — BEB, Bayes Empirical Bayes. 

a 1 and 2 represent the two categories of strongest positive selection as detected with Selecton corresponding to Ka/Ks values over 2.7 for category 1 
and 1-2.7 for category 2 (fig. 1). 
b a) = Ka/Ks. 

c The phylogenetic categories (1, light gray; 2, dark gray; 3, dark green; 4, light green) correspond to the ones shown in figure 2. 



first teleost-conserved block upstream of cypl9alA (green 
box, fig. 3), we also detected putative Nr5a2 (Pezzi et al. 

2004) and Fox-family binding sites (Wang et al. 2010) 
shared between teleosts. Here, we also detected partially con- 
served sites belonging to the category "brain/CNS transcrip- 
tion factors/' such as neuronal factors of the Oct family 
including Oct-6 and the transcriptional repressor DREAM. 
Oct-6 is known to be expressed in brain and testis, to play 
a role in myelination and to act as an RNA polymerase II distal 
enhancer (Hofmann et al. 2010). DREAM plays a role in noci- 
ception in the brain and is broadly expressed in testis (Rivas 
et al. 2004). 

Tilapia, platyfish, and medaka contain, in the first block, a 
potential site for Nr1f1, which is important for the develop- 
ment of the cerebellum, regulation of (lipid) metabolism, 
lymphocyte development, inflammatory responses, and pos- 
sibly myogenesis or muscle function (Jetten 2009). Finally, we 
detected putative CREB (cAMP response element-binding 
protein)-binding sites in a subset of species. CREB can act 
as transcriptional activator or repressor and acts in neurogen- 
esis, neuronal survival and plasticity (Barco and Marie 2011; 
Gruart et al. 2012). Note that CREB binding sites have been 
described in the cypl9alA promoter before (Chang et al. 

2005) . Finally, we detected shared putative Sox-family binding 
sites in the second conserved block (Callard et al. 2001). 

Interestingly, all three conserved blocks of the cypl9alB 
teleost promoter contain putative homeobox domain tran- 
scription factor binding sites, including sites for members of 
the HOX family. Additionally, we detected possible binding 
sites for the Fox-gene-family in block one, which is also the 
case in the cypl9alA promoter (blocks 1 and 2). In three 
teleost species (platyfish, medaka, and stickleback), we 



found conserved putative sites for Hmx3 (involved in speci- 
fication of neuronal cell types, required in the hypothalamic- 
pituitary axis), RFX1 (transcriptional activator of MHC classll 
genes), and PCE-1 (an element usually found in promoter 
regions of photoreceptor genes). However, these sites were 
not detected in tilapia or pufferfishes. 

A closer inspection of the cichlid sequences only (supple- 
mentary fig. S2, Supplementary Material online) revealed that 
these contain, in the conserved upstream regions of both 
gene copies, putative binding sites for SF-1, several Sox- 
family members and several androgen response elements in 
addition to the ERE and estrogen receptor (ER)-related sites. 
The promoter region of cypl9alA in cichlids contains also 
putative binding sites for Nr5a2 and Dmrtl. Furthermore, we 
confirmed the existence of putative binding sites for Nkx6-1 
in both cypl9al promoters in cichlids, just as previously de- 
scribed for cypl9a1B in tilapia, zebrafish, and medaka (Chang 
et al. 2005;Diotel et al. 2010). 

Expression Profiling of Cyp19a1A and B and Other 
Sexual Development Genes in Cichlids 
In a first step, we tested by quantitative real-time polymerase 
chain reaction (qRT-PCR), the expression of our set of candi- 
date genes in three different species from LT (A. burtoni, O. 
centralis, and N. pulcher, for species choice see Materials and 
Methods) belonging to three different cichlid lineages. We 
obtained expression profiles in both gonad and brain tissue 
for 23 of these candidate genes (see Materials and Methods 
for details on our working procedure; the remaining five 
genes, amh, arA, arB, daxlb, and tdrdl could not successfully 
be amplified on cDNA from gonadal tissue). Brain tissue was 
included in these experiments, because several of the 
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Table 2. Putative Transcription Factor Binding Sites in the Teleost Promoter Regions of cypl9alA and B. 



cyp19a1A 





Block 1 a 


Nile tilapia 


NF1 Nr5a2 ERR Sox5 FoxF2 FAC1 OctF Nr1f1 DREAM NCN/NeuroD 


Platyfish 


MEL1 Hmx3 Nr5a2 ERR Nr5a2 ERR Sox5 FoxA1/2 Sry Oct~6 Nr1f1 DREAM CREB Mytll 


Medaka 


SF1 ERR SPZ1 Nr5a2 ERRalpha Sox5 FoxA1/2 FAC1 Oct-6 Nr1f1 DREAM 


Stickleback 


HMGI(Y) 


Fugu 


SPI1 Nr5a2 Sox5 XBP1 Dec-2 ARNT FoxA1/2 ATF2 CREB 


Pufferfish 


Sox5 HRE Dec-2 ARNT FoxF2 ATF2 CREB 




Block 2 


Nile tilapia 


Pou5f1 Sox5 API 


Platyfish 


Sox5 AP1 Hbox-F b FoxJ1 


Medaka 


Sox5 Hbox-F HRE 


Stickleback 


RFX1 FoxK2 


Pufferfish 


CREB 




Block 3 


Nile tilapia 


MEL1 Nanog MyF4 AscM 


Platyfish 


PLAG1 Meisl EvM HBP-1 


Stickleback 


SPI-1 CABP WT1 MEL1 Nr1d1 


cyp19a1B 




Block 1 


Nile tilapia 


Nkx2-5 RFX1/3 MyT1l HoxC9 FoxF1 Nkx2-5 HoxC9 FoxA1/2 Sox9 SPI-1 


Platyfish 


Hmx3 MyT1 RFX1 HoxC9 FoxMa HoxC9 SPI-1 


Medaka 


Hmx3 RFX1 Six4 HoxC9 Hmx2 HMGA1/2 Hox-F c 


Stickleback 


Hmx3 RFX1 FoxF1 MyF6 




Block 2 


Nile tilapia 


Hhex Msx-1/2 Brn-3 Nkx6-1 Nmp4 AP1 Sox2 ERE-F Hbox-F HMGA1/2 PLAG1 


Platyfish 


Sox30 SPZ1 TGIF1 ERE-F Hbox-F PCE-1 SPM 


Medaka 


MEL1 Etv4 TGIF1 ERE-F S8 Hbox-F PCE-1 ERR Nr5a2 


Stickleback 


ERE-F Hbox-F HMGA1/2 PCE-1 


Pufferfish 


ERE-F AP1 Hox9A 




Block 3 


Nile tilapia 


E2a AscM Meisl GSH-2 Phox2a/b SPM Gsh-1 Elf5 HoxC9 Sox-F MyT1l MyT1l 


Platyfish 


TGIF DREAM MEL1 


Medaka 


RFX1 ZIC2 NGN/NeuroD HRE Hbox-F 


Stickleback 


Elf5 MyT1l 



a Blocks correspond to the regions shown in figure 3. 

b Binding sites that are shared between at least two species are underlined. 

C F indicates that the binding site is predicted to bind several members of the same family. For the sake of space, not all family members are 
listed but the family is indicated by -F. 



candidate genes are putative members of the hypothalamic- 
pituitary-gonadal axis, especially those involved in sex steroid 
synthesis, and hence, are expected to be expressed in the brain. 
This expectation can also be made for the adult brain because 
the fish brain produces sex steroid hormones necessary for 
sexual differentiation, plasticity, and reproduction throughout 
the entire lifespan of a fish unlike the mammalian brain, which 
is irreversibly sexualized during early development. 

Table 3 shows a summary of the expression data for the 
three tested cichlid species (for box plots and detailed values 
of all expression tests and statistics see supplementary mate- 
rial S2, Supplementary Material online). Most of the tested 
gene candidates show — at least to some extent — expression 
patterns as described for other species (e.g., dmrtl is also a 
testis gene in cichlids). The tissue of highest expression level is 
most often conserved between the examined species. 



However, we also found deviations from described expression 
patterns. 

We cannot confirm a role for rspol as an ovary factor in 
adult cichlids, where it is mainly expressed in the brain. Rspol 
has been suggested to act as an activator of the wnt/beta- 
catenin pathway in the developing ovary (Smith et al. 2008) 
and is overexpressed in zebrafish ovaries (Zhang et al. 2011). 
Our data do, however, support a role for other members of 
this pathway, ctnnblA and B and wnt4a, in the adult ovary, 
whereas wnt4B, just like rspol, is predominantly expressed in 
the brain and testis; furthermore, ctnnblA seems to also have 
experienced a shift in expression toward the brain in adult 
cichlids. Therefore, the role of the wnt-pathway in ovary 
development seems to be less conserved than previously 
thought, and, in addition, to differ between paralogous 
gene copies. 
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In mammals, Sox9 is indispensable for testis differentiation 
(Kanai et al. 2005) and sox9 is the first gene to show male 
sex-specific expression in many species (for a review see 
Morais da Silva et al. 1996). In fish, both sox9 gene copies 
were found to be (highly) expressed in developing and adult 
testis (Zhou et al. 2003; Johnsen et al. 2013). In our set of 
cichlids, sox9A seems to be important in brain tissue and 
sox9B in ovaries and to some extent in the brain. 

The most intriguing expression patterns observed in our 
data set involved once more the aromatase gene cypl9al. 
Although the brain expression patterns of cypl9alB in two of 
our study species (N. pulcher and O. centralis) are consistent 
with the general trend in teleosts, that is, expression of the 
A-copy in ovaries and the B-copy in the brain (Callard et al. 
2001; Cao et al. 2012), the member of the derived and excep- 
tionally species-rich haplochromine cichlids (A burtoni) 
shows a different pattern. In A burtoni, cypl9alB is overex- 
pressed in testis and not in the brain, a pattern so far not 
described. Also cypl9alA shows a deviation from the general 
trend in teleosts, in this case in the Ectodini O. centralis, where 
we detected a similar expression level in testis and in ovaries 
(the other two cichlids show overexpression in ovaries, like 
other teleosts). 

Because of the observed new expression patterns and the 
strong signal for positive selected sites in cypl9alA, we de- 
cided to study both gene copies in more detail and more 
species focusing on a phylogenetically representative set of 
the East African cichlids. To this end, we performed qRT-PCR 
expression analysis in adults of six additional cichlid species: a 
representative of an extra cichlid tribe (Perissodus microlepis, 
tribe Perissodini) as well as more representatives of the 
Ectodini (Enantiopus melanogenys), the Lamprologini (A/to- 
lamprologus fasciatus), and, particularly, the Haplochromini 
(Ctenochromis horei, Pseudotropheus pulpican, and Haplo- 
chromis obliquidens). Thus, we now also included members 
of the two major haplochromine adaptive radiations, that of 
lakes Victoria (H. obliquidens) and Malawi (P. pulpican). The 
results of these experiments are shown in figure 4 (for further 
details on values and statistics see supplementary material S3, 
Supplementary Material online). 

First, we confirmed that the newly detected overexpres- 
sion of cypl9alB in testis of A burtoni is representative for 
haplochromine cichlids in general, indicating that this shift in 
gene expression has been acquired in the ancestor of the 
most species-rich group of cichlids. 

Second, the expression of cypl9alA in testis in O. centralis 
also seems to be a more general pattern in Ectodini, as evi- 
denced by the analysis of a second representative of this tribe, 
£ melanogenys. The expression patterns observed in 
Lamprologini and Perissodini, however, resemble the presum- 
ably ancestral patterns known from other teleosts. 

Discussion 

In this study, we investigated supposedly conserved candidate 
genes implicated in sex determination, sex differentiation, and 
reproduction in East African cichlid fishes. We found substan- 
tial variation in adult expression patterns for several of these 
genes, suggesting that not only the master initial regulators 



but also the downstream genetic factors of sex determination 
are less conserved than previously thought. We could show 
that this variation depends on the species investigated, and 
that expression can vary even between closely related species. 
Furthermore, variation in sequence evolution and expression 
can act on the level of additional gene copies such as the ones 
that emerged from the FSGD (e.g., see our data on sox9A and 
B and wnt4A and B), opening the path for sub- and neo- 
functionalization. This suggests that not only the master reg- 
ulatory triggers of sex determination are labile in the animal 
kingdom but also that the downstream genetic factors can 
experience substantial changes in coding sequence and func- 
tion. This adds a further level of plasticity to the processes of 
sexual development, at least in teleost fish. 

Aromatase Gene Expression in Cichlids 
The aromatase cypl9al is considered one of the most impor- 
tant, if not the key gene in vertebrate sexual development 
(Diotel et al. 2010; Valenzuela et al. 2013). Cyp 7 9a 7 is the 
sensitive gene that reacts to temperature changes and thus 
induces sexual development in reptiles (Merchant-Larios and 
Diaz-Hernandez 2013), it has been implicated with sex change 
in fish (Liu et al. 2009; Nozu et al. 2009) and it is one of the first 
genes to show sex differences in expression in developing 
tilapia (Ijiri et al. 2008). Cypl9al is a fundamental component 
of the estrogen pathway and it has been suggested that hor- 
monal actions, such as the ones of estrogen, are so important 
that sex determination relies on them exclusively, regardless 
of their initial activation (Angelopoulou et al. 2012; Merchant- 
Larios and Diaz-Hernandez 2013). 

In adult vertebrates, cypl9al typically comes in two fla- 
vors, the brain and the ovary aromatase. In tetrapods, this 
target specificity is achieved through the generation of alter- 
native S'-UTR transcripts derived from a single gene, medi- 
ated by tissue-specific promoters (Toda and Shizuta 1993; 
Golovine et al. 2003); an exception is the pig, which has 
three copies of cyp 7 9a 7 (Graddy et al. 2000). Teleosts, on 
the other hand, have two copies of the gene generated by 
the FSGD and generally express the A-copy in ovaries and the 
B-copy in the brain (an exception are the European and the 
Japanese eels that only retained one copy after the FSGD, 
which is expressed in both, the brain and gonads, Tzchori 
et al. 2004; Jeng et al. 2012). The high expression levels in 
the fish brain, especially of cyp 7 9a 7 B, are explained by its 
regulation through a feedback loop of estrogen on EREs in 
its promoter (Callard et al. 2001; Diotel et al. 2010, and refer- 
ences therein). In general, the functions of cypl9a1A and 
cypl9alB have been suggested to be highly conserved 
among teleosts (Callard et al. 2001). 

Here, we show that East African cichlid fishes deviate from 
this trend. Cypl9alA, which shows strong signs of positive 
selection in cichlids, has experienced a shift in expression from 
ovary-specific to testis and ovaries in the Ectodini of LT. This 
pattern clearly contradicts the general assumption that an 
upregulation of cypl9al A is needed to initiate and maintain 
ovary function in fish, whereas its downregulation induces 
testis development (Guiguen et al. 2010). Furthermore, the 
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Fig. 4. qRT-PCR expression data for aromatase cypl9alA and B in different East African cichlid species. qRT-PCR expression data for cypl9alA and 
cypl9alB are shown as mean normalized expression (MNE, calculated as described in Simon 2003) on a logarithmic scale tested in male and female 
brain tissue, testis, and ovary in species representing four major cichlid lineages (Haplochromini, Ectodini, Perissodini, and Lamprologini). Pseudotropheus 
pulpican is a species of Lake Malawi and Haplochromis obliquidens of Lake Victoria. All other species are from LT. Sample size is five for each tissue and 
sex except for Perissodus microlepis (only four males available and six females) and Altolamprologus fasciatus (only four males available). For details on 
values, raw data, and statistics see supplementary material S3, Supplementary Material online. 



protein domains of Cyp19a1a could be substantially influ- 
enced by the changes occurring in its coding sequence in 
East African cichlids and hence its functionality in the estro- 
gen pathway remains to be determined in these fish. 

Cypl9a1B, on the other hand, shows high sequence con- 
servation in the investigated cichlids suggesting that it pre- 
served its function in testosterone aromatization to estrogen. 
Still, this gene has experienced a shift in expression from brain 
to testis in the Haplochromini. It thus appears that in this 
most species-rich group of cichlids the main site of adult 
estrogen production is the testis and not the brain. The hap- 
lochromines are the cichlid lineage displaying the most pro- 
nounced sexual color dimorphism, which has often been 
implicated with sexual selection via female choice, and they 



show the most derived female mouthbrooding behavior 
(Salzburger et al. 2005). Whether the unprecedented recruit- 
ment of the otherwise brain-specific gene cypl9alB in testis is 
causally linked to some of these features of haplochromine 
cichlids would need to be investigated in the future. 

Adult ectodine cichlids mainly express cypl9a1A in the 
gonads and haplochromine cichlids overexpress cypl9a1B 
in the testis. This should lead, at least in the haplochromines, 
to high estrogen levels in the testis, whereas the functionality 
of Cyp19a1A in estrogen aromatization to testosterone can 
be questioned. 

High estrogen levels in the haplochromine cichlid A. bur- 
toni and in many other vertebrates are linked to aggressive 
behavior (Trainor et al. 2006); however, the steroids effecting 
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aggressive (but not reproductive) behavior are apparently not 
of gonadal origin in A. burtoni (Soma et al. 1996; Huffman 
et al. 2013), leaving the function of high aromatase activity in 
the testis of this species unexplained. Yet, none of the stud- 
ies on estrogen levels in A. burtoni is specific for one of 
the two cyp 7 9a 7 copies (Huffman et al. 2013). 
Pharmacologically blocking aromatase (presumably both 
forms), decreases aggressive behavior, whereas reproductive 
behavior is not influenced (Huffman et al. 2013, and refer- 
ences therein). We suggest further critical investigation of the 
usage of such inhibiting substances with respect to the se- 
quence changes that we observed in cyp 7 9a 7 A. 

Estrogen production via aromatase in the testis of fish has 
been described to be involved in spermatogenesis, where 
estrogen is necessary for the renewal of spermatagonial 
stem cells and thought to be necessary for germ cell prolifer- 
ation and possibly differentiation (Schulz et al. 2010). Yet, 
overexpression of aromatase in testis compared with ovary 
or brain has not been described in fish before (see table 2 in 
Piferrer and Blazquez 2005). Furthermore, an increase of 
estrogen levels has been shown to impair testis function in 
fish and substantially influence reproduction (Kobayashi et al. 
2011), leaving the high expression levels of aromatase, and 
especially cypl9alB in haplochromine cichlids, hard to explain 
with the currently available data on estrogen actions on 
testis/spermiogenesis. 

Considering our analysis of the promoter regions of 
cypl9alA and B in cichlids, we think that the presence of 
SF-1 and Dmrt-1 binding sites in the cypl9alA promoter and 
the expression of both factors in the testis of cichlids (fig. 3 
and table 3) is consistent with a repressive action of these 
two transcription factors on cypl9alA expression in testis. 
Conversely, cypl9alB is overexpressed in testis in haplochro- 
mines, suggesting that it may not be suppressed by SF-1 in 
testis, though the cichlid cypl9alB promoter contains a pu- 
tative binding site for this factor and sf-7 is overexpressed in 
testis. The main expression of nrSal in the brain of A. burtoni 
contradicts its activating action on cyp19alA in gonads, 
although there is a conserved potential binding site for it in 
the A promoter. Still, Nr5a2 could drive cypl9alA expression 
in the brain or earlier during development in the gonads. 

Further studies should unravel the functionality of the two 
aromatase copies in cichlids, their function throughout brain 
and gonad development and their action on adult steroid 
production involved in aggression and reproduction. Also 
the regulation of their expression by neuronal/gonadal tran- 
scription factors as well as through estrogens and androgens 
(as suggested by the presence of response elements for both 
steroid types in the cichlid promoter regions of cypl9alA 
and B) need further investigation. 

Contrasting Sequence Evolution Trajectories of the 
Two Aromatase Gene Copies in Cichlids 
The relative contribution to phenotypic divergence of 
changes in the c/s-regulatory elements of a gene versus 
changes in the coding sequence is still nebulous and a 
matter of debate (Hoekstra and Coyne 2007), although it 



has been argued that rapid evolution in closely related taxa 
such as cichlids is more likely to be mediated by regulatory 
changes (Baldo et al. 2011). Here, we show that even two 
copies of the same ancestral gene emerging from a duplica- 
tion event may pursue these distinct evolutionary trajectories, 
both likely leading to functional divergence. Cyp19a1A has 
undergone various, presumably adaptive changes in its amino 
acid sequence during cichlid evolution that might substan- 
tially influence its function. Cypl9a1B, on the other hand, 
seems to be under strong purifying selection in cichlids, indi- 
cating that functional constraints act on at least one of the 
duplicate gene copies, though the effects of aromatase action 
remain illusive. Still, also the B-copy shows new evolutionary 
paths in cichlids: in addition to expression shift in 
Haplochromini (overexpression in testis) we could show the 
usage of different TSSs in Ectodini and Haplochromini, a 
feature already described for the single copy cyp 7 9a 7 gene 
in tetrapods (mentioned earlier). This shift seems to be 
caused by cis or trans regulatory changes. The other, more 
basal cichlids investigated here show conserved expression 
patterns in both genes resembling other teleosts. This illus- 
trates that paralogs can be subjects to differential evolution- 
ary fates also at late, that is, derived stages of an evolutionary 
radiation. 

More Flexibility in the Sex-Determining Cascade: Sox9 
Expression in Cichlids — More than a Testis Factor 
In mammals, Sox9 drives testis differentiation (Kanai et al. 
2005) and a similar role has been suggested for this gene 
in other vertebrate groups, after the observation that it is 
often the first gene to show male sex-specific expression 
(Morais da Silva et al. 1996). In teleost fish, both sox9 
gene copies were described to be (highly) expressed in 
developing and adult testis (Zhou et al. 2003; Johnsen 
et al. 2013). Apparently, however, the expression pattern 
is more flexible in adult fish, it varies by species and 
depends on the gene copy investigated. In line with 
this, we show that sox9A seems to rather play a role in 
the brain and sox9B in ovaries and also to some extent in 
the brain. This illustrates that substantial differences exist 
in the expression pattern of sox9 between teleost fishes 
and underlines once more the importance of studying 
both duplicates of a single copy tetrapod gene, when 
present in fish, and not to conclude a function from 
one copy only. This is also illustrated by our data on 
wnt4A and B. Sox9A and B are obviously another example 
of a sub- or neofunctionalization of gene copies after du- 
plication (Ohno 1970). 

Sexual Development Is Plastic in Teleost Fishes 
It has been proposed that a set of genes forms a common 
regulatory network of sexual differentiation in vertebrates 
(Angelopoulou et al. 2012; Valenzuela et al. 2013). This 
study shows that some of the genes implicated in sexual 
development and reproduction are indeed conserved be- 
tween lineages. The function of these genes, however, as 
reflected by their varying expression patterns in different 
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species (this study and see Cutting et al. 2013 for a review) 
and changing coding sequence as we show here for the aro- 
matase gene copies, seems to be less conserved than previ- 
ously thought. 

Plasticity in sexual development has been studied in detail 
in sex-changing fish and species that possess temperature 
dependent sex-determining mechanisms, where gene expres- 
sion changes in response to temperature have been assessed. 
Most other available studies on sexual development have 
focused on one gene or a particular pathway of a few inter- 
acting genes. Our work was based on a broad screen of many 
genes and with respect to both, gene expression and 
sequence evolution, in a prime model system in evolutionary 
biology, the East African cichlids. We detected substantial 
differences in expression within different cichlid species and 
between cichlids, or certain lineages thereof, and other tele- 
osts. The most intriguing outcome of our study is the lineage- 
specific shifts in gene expression in the two copies of the 
aromatase cypl9al, a major core gene of sexual development, 
behavior, and reproduction in chordates. That the changes in 
gene expression in the two cypl9al copies occurred in 
derived cichlid lineages, but not in the ancestral tribes, sug- 
gests that functional divergence between two gene copies 
derived from the FSGD adds another level of flexibility to 
the sex-determining cascade of teleosts. 

Taken together, most of the studied genes are part of the 
cascade of sex determination, differentiation, and reproduc- 
tion across teleosts but their usage may differ substantially at 
different taxonomic levels. This can be interpreted as a devel- 
opmental system drift (DSD), a change in the molecular path- 
way with preservation of the morphological/phenotypic 
outcome (True and Haag 2001), in this case the establish- 
ment, maintenance, and functioning of separate sexes. 
Indeed, changes of master control genes in the sex-determin- 
ing cascade between closely related taxa have been called "the 
most striking examples of diversity in a conserved regulatory 
process" (True and Haag 2001), exemplifying DSD. Our data 
and also those of others (Wang and Sommer 201 1; Valenzuela 
et al. 2013) indicate that this diversity not only affects the top 
of the sex-determining cascade but also includes sexual dif- 
ferentiation genes, hormone pathways and the molecular 
networks necessary for the development of reproductive 
organs and behavior. The occurring modifications can thus 
affect all levels of a developmental process and may depend 
on cis or trans changes, novel wiring, new patterns of coding 
sequence evolution, different gene usage and de novo recruit- 
ment and evolution of genes. 

Especially the observed sequence and expression changes 
of the aromatase gene copies could have implications on the 
evolution of sex chromosomes, which might be driven by the 
action of hormones (Howard 2002). Our study once more 
demonstrates that understanding evolutionary shifts in 
sex-determining systems requires the characterization of 
downstream gene variability (Uller and Helantera 2011). 
Cichlid fishes are, in this respect, an ideal model system 
since many closely related, hence genetically very similar spe- 
cies, with different sex-determining mechanisms and all sorts 
of reproductive strategies exist. 



Materials and Methods 

Tissue Sampling and Nucleic Acid Extraction 
The presented data set comprises DNA sequence data for 33 
species of East African cichlid fishes (see supplementary table 
S4, Supplementary Material online) and expression data for 
nine species (A. burtoni, C horei, O. centralis, E. melanogenys, 
N. pulcher, Alt fasciatus, P. microlepis, H. obliquidens, and 
Pse. pulpican). The investigated species belong to 13 different 
tribes and thus cover a great portion of the taxonomic and 
phylogenetic diversity of East African cichlids (Salzburger et al. 
2002, 2005). The species that were tested for candidate gene 
expression were chosen with regards to available genetic re- 
sources and their respective phylogenetic positions. All can- 
didate genes were tested in A. burtoni (a member of the 
Haplochromini), O. centralis (Ectodini), and N. pulcher 
(Lamprologini). These three cichlid tribes differ remarkably 
in many life history aspects and with regards to sexual devel- 
opment and, especially, their reproductive strategies. A. bur- 
toni and O. centralis are maternal mouthbreeders, whereas N. 
pulcher is a bi-parental substrate spawner within a coopera- 
tive helper group. Additionally, A. burtoni shows a rather 
strong sexual color dimorphism. 

Samples of the species from LT were collected during field 
expeditions to the lake and surrounding rivers between 2007 
and 2012 under a research permit issued by the Lake 
Tanganyika Research Unit, Department of Fisheries, 
Mpulungu, Zambia. Tissue samples of N. callipterus were 
kindly provided by M. Taborsky. Adult and mature specimens 
for two additional species (H. obliquidens from Lake Victoria 
and P. pulpican from Lake Malawi) were derived from the 
aquarium trade. Life fish were kept under standard conditions 
(12 h light, 12 h dark, 25 °C) at the animal facility of the 
Salzburger group, University of Basel, Switzerland. Research 
was performed under the cantonal veterinary permit no. 
2317. 

All samples for the expression studies were collected 
during the above mentioned field expeditions in 2011/2012. 
Tissues (brain, gonad, gill, anal fin, and liver) were collected 
from wild caught adult fish within one hour to reduce gene 
expression shifts caused by stress. Tissues were directly stored 
in RNAIater (LifeTechnologies). DNA samples (muscle and fin 
tissue) were taken at the same time and stored in 100% 
ethanol. Fish were sexed based on external morphological 
traits and visual gonad inspection. Samples were only in- 
cluded in the study when ovary or testis structure could be 
determined unambiguously. For the mouth-breeding species 
only non-breeding females were dissected. Given the sam- 
pling conditions, reliable determination of GSI (gonadoso- 
matic index) or plasma steroid levels to infer reproductive 
status of the invested specimens was not feasible. Note how- 
ever, that previous studies on stable laboratory populations of 
male A. burtoni have shown that changes in gonad mass are 
not necessarily reflecting reproductive maturity but are rather 
signs of interstitial cell division (Huffman et al. 2012, and ref- 
erences therein). Additionally, testis of both subordinate and 
dominant males contains all spermatogenic stages (Maruska 
and Fernald 2011) and sperm proliferation does not differ 
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between them (Kustan et al. 2011). A reliable assessment of 
reproductive status and dominance over gene expression 
analysis is, however, possible (Huffman et al. 2012) and sex- 
specific expression in our data set is consistent between sam- 
ples of one sex. 

Genomic DNA was extracted from ethanol preserved 
muscle or fin tissue applying manually a Proteinase K diges- 
tion followed by sodium chloride extraction and ethanol pre- 
cipitation (Laird et al. 1991) or using the robotic workstation 
BioSprint 96 (Qiagen) following the manufacturer's 
instructions. 

Samples stored in RNAIater were incubated overnight in 
Trizol (LifeTechnologies) at 4°C (this initial incubation step 
increases RNA yield after long-term storage in RNAIater) prior 
to extraction and subsequently homogenized in Trizol using a 
FastPrep24 (MP Biomedicals Europe). Total RNA was 
extracted following the Trizol protocol. RNA concentration 
and quality was measured using a NanoDroplOOO 
Spectrophotometer (ThermoScientific). RNA samples were 
treated with DNA-free Kit (LifeTechnologies) as recom- 
mended by the manufacturer. 

Reverse Transcription and Quantitative 
Real-Time PCR 

DNase-treated RNA was reverse transcribed using the High 
capacity RNA-to-cDNA kit (LifeTechnologies) following the 
manufacturer's protocol. Real-time PCR experiments were 
run on a StepOnePlus Real-Time system (LifeTechnologies) 
on a final cDNA concentration of 0.5 ng/ul for the experiment 
shown in table 3 and 1 ng/|il for the experiment shown in 
figure 4 with 200 nM final primer concentration and the SYBR 
Green master (Rox) dye (Roche) in 20 |il final volume with 
the following cycling conditions: 95 °C 10 min, 40 cycles 95 °C 
10 s, and 58 °C 30 s for experiments shown in column 3 of 
table 3 and 95 °C 10 min, 40 cycles 95 °C 15 s, and 58 °C 60 s 
for all other experiments shown). All qRT-PCR amplifications 
included a melt curve step after cycling. Primers were con- 
structed in conserved regions chosen based on the A. burtoni 
and N. brichardi (N. brichardi forms together with the expres- 
sion investigated for N. pulcher, the N. brichardi /pulcher spe- 
cies complex, Duftner et al. 2007) genome sequences and 
sequenced amplicons of the corresponding regions for 
O. centralis to guarantee equal binding efficiencies between 
species. 

All used primers (see supplementary table S5 
[Supplementary Material online] for primer sequences) 
were initially validated on serial dilutions of factor 5 of 
A. burtoni juvenile whole body mixed cDNA (34 individuals 
of 7.8-24 mm standard length, 3 weeks to 2 months age, 
laboratory strain, equimolar pooled RNA) and for N. bri- 
chardi I pulcher specific primers on serial dilutions of factor 
5 of N. brichardi juvenile whole body mixed cDNA (six indi- 
viduals pooled, 18.4-24.6 mm standard length, 3 months age, 
kindly provided by H. Gante, University of Basel) starting with 
10ng/|il DNA final concentration. qRT-PCR efficiencies (E) 
were calculated for each reaction from the slope of the stan- 
dard curve using the equation E = io (_1/slope) as implemented 



in the StepOnePlus software (LifeTechnologies) with an effi- 
ciency of 2 being equal to 100% (E% = [10 (_1/slope) - 1] x 100) 
and an indicator of a robust assay. 

Expression was measured as comparative C T experiments 
using rp\7 (ribosomal protein L7) as control gene and gills as 
reference tissue. Rp\7 was chosen after an initial comparison 
with two other presumably house keeping genes (efla and 
beta-actin). rp\7 showed the most consistent expression be- 
tween samples (i.e., smallest C T range) and has been used 
previously in fish qRT-PCR experiments (Bonne et al. 2010). 
Gill tissue was chosen as reference for the delta-delta-C T 
method, because it did not show differences between sexes 
and expressed the genes of interest. Furthermore, expression 
in gills was most consistent between species and in compar- 
ison to liver and anal fin tissue. For quantification, data were 
first analyzed using the delta-delta-C T method (Livak and 
Schmittgen 2001) and custom R-scripts. As lowest threshold 
for expression we applied 37 cycles. As a second, and this time 
reference tissue independent, analysis method, mean normal- 
ized gene expression was calculated as described in (Simon 
2003) again using rp\7 as control gene. 

All experiments were carried out with three technical rep- 
licates. The details of all experiments including used samples, 
raw data, graphs, and statistics are given in supplementary 
materials S2 and S3, Supplementary Material online. 

Cyp19a1A Gene Sequencing 

A fragment including the entire coding region was amplified 
by PCR on genomic DNA of 28 cichlid species from LT. The 
species cover 12 tribes of the cichlid assemblage of this lake; 
the number of species sequenced per tribe was chosen ac- 
cording to species-richness of the tribe. When available, a 
male and a female specimen were sequenced (supplementary 
table S4, Supplementary Material online). 

Long-range PCR reactions were carried out using Phusion 
Master Mix (New England Biolabs) in 40 |il reaction volume 
on 50 ng genomic DNA with the primers for 5'-TGAACTAGG 
TCCTGTAAACCCAAGG-3' and rev 5'-AAGACTTTTGCACAG 
AACAGTAGG-3' (cycling conditions: 98 °C 30 s followed by 
35 cycles of 98 °C for 10 s, 64 °C for 30 s, 72 °C for 4 min and 
with a final elongation of 10 min at 72 °C, primers ordered 
from Microsynth). 

PCR products were visualized using SYBRSafe DNA Gel 
Stain (LifeTechnologies) on a 1.5% agarose gel. Thirty micro- 
liters of PCR product was gel purified using QIAquick Gel 
Extraction Kit (Qiagen). 100 ng of purified PCR product 
were used for barcoded library preparations using 
NEBNextFast DNA fragmentation kit (fragmentation time 
28 min) and Ion Xpress Plus Fragment Library kit and Ion 
Xpress Barcode Adapters (LifeTechnologies) according to 
manufacturer's recommendation. Libraries were pooled 
after barcoding for subsequent analysis. Quantification prior 
to sequencing was done with the Ion Library Quantitation Kit 
(LifeTechnologies) on a StepOnePlus Real-Time system 
(LifeTechnologies). 

Libraries were sequenced on three 314 chips on an Ion 
PGM Sequencer (LifeTechnologies). Demultiplexed sequences 
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were loaded into Geneious (Biomatters Ltd.) and mapped to 
the reference amplicon reconstructed from the A. burton] 
genome (BROAD) as closest available relative for all self-se- 
quenced species. Alignments were corrected manually. The 
subsequent sequence analyses were performed in Geneious 
using MUSCLE (Edgar 2004) as alignment algorithm. 
Alignments per species were combined to construct one con- 
sensus sequence per species with 75% identical threshold for 
ambiguous sites. Sequences for 0. niloticus, P. nyererei, and 
N. brichardi were retrieved from the cichlid genomes 
(BROAD) and included in the alignment. 

The final alignment based on the grouped nucleotide 
sequences was trimmed and used for the reconstruction of 
a phylogenetic tree with MrBayes (5 million generations, de- 
fault settings, version 3.2.1. , Ronquist et al. 2012) under the 
GTR + I + gamma model as determined with jModelTest2 
(Darriba et al. 2012). The obtained tree topology was in 
accordance with a maximum likelihood tree (1,000 boot- 
straps) obtained with PhyAAL (Guindon et al. 2010) (data 
not shown). An alignment covering only the coding region 
from start to stop codon according to the Nile tilapia 
sequence available in Ensembl (ENSONIT00000000198) was 
used for selection analysis. 

In Silico Sequence Analysis 

Sequences of the candidate genes were identified using the 
reciprocal best hit BLAST method with annotated sequences 
from the medaka (O. latipes) genome assembly (http://www. 
ensembl.org/Oryzias_latipes, last accessed July 23, 2013) as 
queries against the Nile tilapia (Oreochromis niloticus) EST 
data available in Genbank (www.ncbi.nlm.nih.gov, last 
accessed July 23, 2013) and, upon availability, against the 
Nile tilapia genome (http://www.ensembl.org/Oreochromis_ 
niloticus, last accessed July 23, 2013). The retrieved tilapia 
sequences were used as queries in a local BLASTn search 
against the cichlid fish genomes (BROAD) as references. If 
the reciprocal best hit method was not conclusive, especially 
to distinguish cichlid/teleost copies of the same original tet- 
rapod gene, phylogenetic analysis were carried out (data not 
shown) including more teleost sequences to characterize the 
A- and B-copies according to the annotation used for medaka 
in Ensembl (www.ensembl.org, last accessed July 23, 2013). 

In-frame alignments of the coding sequences from start 
codon (included) to stop codon (excluded for Selecton anal- 
ysis, see below) were performed with CodonCodeAligner 
(CodonCode Corporation). All alignments were checked 
and corrected manually with respect to exon-intron bound- 
aries and in-frame indels. 

Overall Ka/Ks estimates were calculated using 
KaKs_calculator (Zhang et al. 2006) with the MA method 
for each pairwise comparison to the Nile tilapia sequence 
over the entire length (supplementary tables S2 and S3, 
Supplementary Material online). 

Site-wise Ka/Ks estimates were performed with Selecton 
(Doron-Faigenboim et al. 2005; Stern et al. 2007) using the 
tilapia sequence as query. The M8 (M8, beta + w > 1) model 
enabled for positive selection was used (Yang et al. 2000). 



When positively selected sites were detected under this 
model, it was tested against the null model (no positive se- 
lection, M8a, beta + w= 1, Swanson et al. 2003). 

The distribution of positively selected sites in cypl9alA 
along the phylogeny was tested with HyPhy (Pond et al. 
2005) and codeml, which is part of the PAML package 
(Yang 2007). To test for branch-specific signs of positive selec- 
tion we run the Branch-site REL model (Kosakovsky Pond et al. 
2011) as implemented in HyPhy on www.Datamonkey.org 
(last accessed July 23, 2013) (Pond and Frost 2005; Delport 
et al. 2010). To test for site- and branch-specific signs of selec- 
tion, we used codeml with the branch-site model and a test for 
positive selection (Yang and Nielsen 2002; Yang et al. 2005; 
Zhang et al. 2005) to test different clades against all other 
clades of the phylogeny (fig. 2). 

Promoter Analysis 

Promoter analyses were carried out on the flanking regions of 
the cypl9alA and cypl9alB genes exported from all available 
teleost genomes on www.ensembl.org (last accessed July 23, 
2013) (Release 69, October 2012) and on the sequences avail- 
able for A. burtoni, P. nyererei, and N. brichardi (BROAD). 
Annotation files were exported from Ensembl or, whenever 
necessary, created manually for the nonannotated genomes. 
The annotation for gldnB on the Nile tilapia sequence was 
added manually, as this gene is not annotated in the cur- 
rent genome release. Annotation was done using the 
coding sequence present in Genbank (Accession number 
XM_003443944). Alignments for all teleost promoters were 
performed using mVISTA (Mayor et al. 2000; Frazer et al. 
2004) with Shuffle-LAGAN as alignment algorithm. 
Alignments including only cichlid sequences were performed 
with MUSCLE (Edgar 2004) as implemented in Geneious 
(Biomatters Ltd). Putative transcription factor binding sites 
were annotated according to (Tong and Chung 2003; 
Yoshiura et al. 2003; Chang et al. 2005; Ohmuro- 
Matsuyama et al. 2007; Wang et al. 2007; Diotel et al. 2010) 
and with the help of Matlnspector as implemented in the 
Genomatix software suite V2.6 (Genomatix Software GmbH) 
with a matrix similarity cutoff of >0.9. We selected only tran- 
scription factors that fell in the categories brain and/or CNSs 
and ovary, testis, and germ cells. Note that putative transcrip- 
tion factor binding sites that are described as "transcription 
factor -F" (e.g., Hbox-F) include more than three members of 
the same family predicted to bind at the same site and are 
thus abbreviated with -F for family. 

Supplementary Material 

Supplementary material S1-S3, tables S1-S5, and figures S1 
and S2 are available at Molecular Biology and Evolution online 
(http://www.mbe.oxfordjournals.org/). 
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